Explore no more: Improved high-probability regret bounds for non-stochastic bandits
Author
Gergely Neu
Abstract
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold in expectation. One of these modifications is forcing the learner to sample the losses of every arm at least Ω(√T) times over T rounds, which can adversely affect performance if many of the arms are obviously suboptimal. While it is widely conjectured that this property is essential for proving high-probability regret bounds, we show in this paper that it is possible to achieve such strong results without this undesirable exploration component. Our result relies on a simple and intuitive loss-estimation strategy called Implicit eXploration (IX) that allows a remarkably clean analysis. To demonstrate the flexibility of our technique, we derive several improved high-probability bounds for various extensions of the standard multi-armed bandit framework. Finally, we conduct a simple experiment that illustrates the robustness of our implicit exploration technique.
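Concretely, IX replaces the usual importance-weighted loss estimate ℓ_{t,i}·𝟙{I_t = i}/p_{t,i} with ℓ_{t,i}·𝟙{I_t = i}/(p_{t,i} + γ), where p_{t,i} is the probability of playing arm i in round t and γ > 0 is the implicit-exploration parameter. Below is a minimal Python sketch of an exponential-weights learner using this estimate; the fixed loss-matrix interface and the constant parameters η and γ are illustrative simplifications, not the paper's exact pseudocode.

    import numpy as np

    def exp3_ix(losses, eta, gamma, rng=None):
        # losses: (T, K) array of per-round losses in [0, 1] (illustrative interface)
        rng = np.random.default_rng() if rng is None else rng
        T, K = losses.shape
        cum_est = np.zeros(K)      # cumulative IX loss estimates
        total = 0.0
        for t in range(T):
            # exponential-weights distribution; note: no forced uniform exploration
            w = np.exp(-eta * (cum_est - cum_est.min()))
            p = w / w.sum()
            arm = rng.choice(K, p=p)
            loss = losses[t, arm]
            total += loss
            # Implicit eXploration: divide by p + gamma instead of p, which biases
            # the estimate slightly downward but keeps its variance under control
            cum_est[arm] += loss / (p[arm] + gamma)
        return total

In the paper's analysis the two parameters are coupled, with γ set to η/2.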
Similar papers
An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is O(K√(n log n)) and against stochastic bandits the pseudo-regret is O(∑_i (log n)/Δ_i). We also show that no algorithm with O(log n) pseudo-regret against stochastic bandits can achieve Õ(√n) expected regret against adaptive...
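For clarity, the pseudo-regret over n rounds is, in the standard loss notation (with I_t the arm played in round t),

    R̄_n = max_i E[ ∑_{t=1}^{n} (ℓ_{I_t,t} − ℓ_{i,t}) ],

so the expectation is taken before maximizing over comparator arms; the expected regret takes the maximum inside the expectation, which is exactly the distinction the last claim above relies on. In the stochastic bound, Δ_i denotes the suboptimality gap of arm i.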
Improved Algorithms for Linear Stochastic Bandits
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem. In particular, we show that a simple modification of Auer's UCB algorithm (Auer, 2002) achieves, with high probability, constant regret. More importantly, we modify and, consequently, improve the analysis of the algorithm f...
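For context, Auer's UCB algorithm referenced above plays in each round the arm maximizing an optimistic index, the empirical mean plus a confidence bonus √(2 ln t / n_i). The following is a generic Python sketch of that index computation, not the modified variant analyzed in the cited paper; the function name and the pull(i) reward interface are illustrative.

    import math

    def ucb1(pull, K, T):
        # pull(i): returns a stochastic reward in [0, 1] for arm i (illustrative)
        means = [pull(i) for i in range(K)]   # play every arm once to initialize
        counts = [1] * K
        for t in range(K + 1, T + 1):
            # optimistic index: empirical mean + sqrt(2 ln t / n_i)
            i = max(range(K),
                    key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
            r = pull(i)
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]  # incremental mean update
        return means, counts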
Unimodal Bandits without Smoothness
We consider stochastic bandit problems with a continuum set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected reward function. We propose Stochastic Pentachotomy (SP), an algorithm for which we derive finite-time regret upper bounds. In particular, we sho...
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits A Proofs of Main Theorems
A Proofs of Main Theorems
A.1 Proof of Lemma 1
Let R_t = R(A_t, w_t) be the stochastic regret of CombUCB1 at time t, where A_t and w_t are the solution and the weights of the items at time t, respectively. Furthermore, let E_t = {∃ e ∈ E : |w̄(e) − ŵ_{T_{t−1}(e)}(e)| ≥ c_{t−1, T_{t−1}(e)}} be the event that w̄(e) is outside of the high-probability confidence interval around ŵ_{T_{t−1}(e)}(e) for some item e at time t; and let E_t ...
Conservative Bandits
We study a novel multi-armed bandit problem that models the challenge faced by a company wishing to explore new strategies to maximize revenue whilst simultaneously maintaining its revenue above a fixed baseline, uniformly over time. While previous work addressed the problem under the weaker requirement of maintaining the revenue constraint only at a given fixed time in the future, the algori...
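A common way to make the baseline requirement precise, up to details of the cited formulation, is to demand that for every round t the learner's cumulative expected revenue stays above a (1 − α) fraction of what the baseline strategy would have earned:

    ∑_{s=1}^{t} μ_{I_s} ≥ (1 − α) · t · μ_0   for all t,

where μ_0 is the baseline's per-round expected revenue, μ_{I_s} the expected revenue of the arm played in round s, and α ∈ (0, 1) the tolerated shortfall; these symbols are generic notation introduced here for illustration.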